Introduction to Reinforcement Learning
-
Reinforcement Learning (RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences.
-
What distinguishes RL from other machine learning techniques:
- RL does not require labeled data.
- RL learns from the consequences of its actions.
- RL learns from feedback from the environment that may be delayed rather than immediate.
- RL learns to make a sequence of decisions to achieve a long-term goal, so the data is sequential rather than independent and identically distributed (i.i.d.).
Reinforcement Learning Terminology:
- Agent: The learner or decision-maker that interacts with the environment.
- Environment: The external system with which the agent interacts.
- State (s): A representation of the environment at a given time.
- Action (a): A decision taken by the agent to transition from one state to another.
- Reward (r): A scalar feedback signal from the environment to the agent.
- Policy (π): A strategy or rule that the agent follows to select actions.
- Value Function (V): The expected cumulative reward of following a policy from a given state. It is a prediction of future rewards and is used to evaluate how good a state is.
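The terms above fit together in a single interaction loop. The following is a minimal sketch, assuming a hypothetical toy environment (`CorridorEnv`, with positions 0 to 4 and a reward of 1 for reaching position 4) and a hand-written policy; none of these names come from the course material.

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""
    N_STATES = 5

    def reset(self):
        """Start a new episode and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """action is -1 (left) or +1 (right). Returns (next_state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.N_STATES - 1)
        done = self.state == self.N_STATES - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the episode ends; return (state, action, reward) triples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                        # agent selects an action
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def always_right(state):
    """A fixed policy: ignore the state and always move right."""
    return +1


trajectory = run_episode(CorridorEnv(), always_right)
total_reward = sum(r for _, _, r in trajectory)
print(trajectory, total_reward)
```

Here the policy is fixed; a learning agent would use the collected rewards to improve it.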
Definition (reward hypothesis): All goals can be described as the maximization of the expected cumulative reward.
The goal is to select actions that maximize the total future reward. Actions may affect not only the immediate reward but also future rewards, so the agent must learn to balance immediate and future rewards.
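One common way to make the trade-off between immediate and future rewards concrete is a discounted sum of rewards. The sketch below assumes a discount factor `gamma`, which is not introduced in these notes; it is used here only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


immediate = [1.0, 0.0, 0.0, 0.0]   # small reward right away
delayed = [0.0, 0.0, 0.0, 2.0]     # larger reward three steps later

for gamma in (0.9, 0.5):
    print(gamma, discounted_return(immediate, gamma), discounted_return(delayed, gamma))
```

With `gamma = 0.9` the delayed reward is worth more (about 1.458 versus 1.0); with `gamma = 0.5` the immediate reward wins (1.0 versus 0.25).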
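Definition: The history is the sequence of observations, actions, and rewards up to the current time step.
- History: $h_t = o_1, r_1, a_1, \ldots, a_{t-1}, o_t, r_t$, where $o$ denotes an observation.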
Definition: The state is a function of the history.
- State: $s_t = f(h_t)$
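Because the state is any function of the history, different choices of that function give different states from the same experience. A small sketch, assuming a hypothetical history stored as a list of (observation, reward, action) steps:

```python
# Hypothetical history: one (observation, reward, action) tuple per time step.
# The last step has action None because the agent has not acted yet.
history = [
    ("room_A", 0.0, "right"),
    ("room_B", 0.0, "right"),
    ("room_C", 1.0, None),
]


def last_observation_state(history):
    """State = the most recent observation only."""
    return history[-1][0]


def last_k_observations_state(history, k=2):
    """State = tuple of the k most recent observations."""
    return tuple(obs for obs, _, _ in history[-k:])


print(last_observation_state(history))
print(last_k_observations_state(history))
```

Both functions are valid state definitions; which one is useful depends on how much of the past is needed to predict what happens next.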
Full observability: If the agent's sensors give it access to the complete state of the environment, then the environment is said to be fully observable.
Partial observability: If the agent's sensors give it access to only a partial state of the environment, then the environment is said to be partially observable.
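The difference can be illustrated with two observation functions over the same environment state. This is a sketch under assumed names (`EnvState`, `observe_fully`, `observe_partially`), not a definition from the course.

```python
from typing import NamedTuple


class EnvState(NamedTuple):
    """Hypothetical complete environment state."""
    agent_pos: int
    goal_pos: int


def observe_fully(state):
    """Fully observable: the agent sees the complete environment state."""
    return state


def observe_partially(state):
    """Partially observable: the agent sees its own position but not the goal."""
    return state.agent_pos


a = EnvState(agent_pos=2, goal_pos=0)
b = EnvState(agent_pos=2, goal_pos=4)

print(observe_fully(a) == observe_fully(b))
print(observe_partially(a) == observe_partially(b))
```

Under partial observability the two situations look identical to the agent even though the best action differs, so the agent has to build its own state from the history.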
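A model predicts what the environment will do next. It acts as a simulation of the environment and can be used for planning and learning.
Transitions: $\mathcal{P}^a_{ss'} = \mathbb{P}[s_{t+1} = s' \mid s_t = s, a_t = a]$ predicts the next state.
Rewards: $\mathcal{R}^a_s = \mathbb{E}[r_{t+1} \mid s_t = s, a_t = a]$ predicts the next (immediate) reward.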
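For a small problem, a model can be written down as two tables: one for transitions and one for rewards. The sketch below uses a hypothetical two-state problem; the state names, action names, numbers, and the discount factor `gamma` are all assumptions made for illustration.

```python
import random

# Transition model: P[(state, action)] maps each next state to its probability.
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}

# Reward model: R[(state, action)] is the expected immediate reward.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "go"): -1.0,
    ("s1", "stay"): 2.0,
    ("s1", "go"): 0.0,
}


def simulate_step(state, action, rng):
    """Use the model as a simulator: sample a next state and return the reward."""
    next_states = list(P[(state, action)])
    weights = [P[(state, action)][s] for s in next_states]
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    return next_state, R[(state, action)]


def lookahead(state, action, values, gamma=0.9):
    """Use the model for planning: expected reward plus discounted next-state value."""
    expected_next = sum(p * values[s2] for s2, p in P[(state, action)].items())
    return R[(state, action)] + gamma * expected_next


rng = random.Random(0)
print(simulate_step("s0", "go", rng))
print(lookahead("s0", "go", {"s0": 0.0, "s1": 10.0}))
```

`simulate_step` shows the model used as a simulator, while `lookahead` shows it used for one step of planning without touching the real environment.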
Categorizing RL Agents
-
Value-based RL: The agent learns a value function that estimates how good it is to be in a particular state.
- No policy is explicitly learned; the policy is implicit in the value function.
- A value function is learned.
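A sketch of how a policy can stay implicit in a value-based agent: the agent simply picks the action with the highest estimated value. This example assumes action values (one estimate per state-action pair) rather than state values, and the numbers are made up.

```python
ACTIONS = ["left", "right"]

# Hypothetical learned estimates: q_values[(state, action)].
q_values = {
    ("s0", "left"): 0.1,
    ("s0", "right"): 0.7,
    ("s1", "left"): 0.4,
    ("s1", "right"): 0.2,
}


def greedy_action(q, state, actions):
    """Implicit policy: choose the action with the highest estimated value."""
    return max(actions, key=lambda a: q[(state, a)])


for s in ("s0", "s1"):
    print(s, greedy_action(q_values, s, ACTIONS))
```

No separate policy object is stored; changing the value estimates changes the behaviour.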
-
Policy-based RL: The agent learns a policy that directly maps states to actions.
- No value function is explicitly learned.
- A policy is learned.
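A sketch of an explicitly represented policy, assuming a softmax over per-action preferences; the parameter table `theta` and its values are hypothetical, and the learning rule that would adjust them is omitted.

```python
import math
import random

ACTIONS = ["left", "right"]

# Hypothetical policy parameters: one preference per action in each state.
theta = {"s0": [0.0, 2.0], "s1": [1.0, 1.0]}


def softmax(preferences):
    """Turn preferences into probabilities (shifted by the max for stability)."""
    m = max(preferences)
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(state):
    """The policy itself: a probability for each action in the given state."""
    return softmax(theta[state])


def sample_action(state, rng):
    """Draw an action according to the policy's probabilities."""
    return rng.choices(ACTIONS, weights=action_probabilities(state), k=1)[0]


rng = random.Random(0)
print(action_probabilities("s0"), sample_action("s0", rng))
```

A policy-based method would adjust `theta` directly from experience, without estimating values first.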
-
Actor-critic RL: The agent learns both a policy and a value function.
- A policy is learned (the actor).
- A value function is learned (the critic).
-
Model-based RL: The agent learns a model of the environment.
- The model is used for planning and learning.
- The model consists of predictions of transitions and rewards.
-
Model-free RL: The agent learns directly from the environment without a model.
- No model is learned.
- No planning is done.
- A value function and/or a policy is learned.
Exploration vs. Exploitation
- Exploration: The agent tries new actions to gather more information about the environment.
- Exploitation: The agent uses the information it already has to maximize reward.
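One simple way to mix the two is an epsilon-greedy rule: with a small probability the agent explores by acting at random, otherwise it exploits its current estimates. The sketch below applies it to a hypothetical multi-armed bandit; the arm means, noise level, and `epsilon` value are assumptions for illustration.

```python
import random


def epsilon_greedy(q_estimates, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-looking action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_estimates))                        # explore
    return max(range(len(q_estimates)), key=lambda a: q_estimates[a])  # exploit


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Estimate each arm's mean reward while acting epsilon-greedily."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)    # current reward estimate per arm
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)   # noisy reward from the chosen arm
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]      # incremental sample average
    return q, counts


q, counts = run_bandit([0.2, 1.0, 0.5])
print(q, counts)
```

Setting `epsilon = 0` gives pure exploitation, which can lock onto a poor arm early; setting `epsilon = 1` gives pure exploration, which never uses what was learned.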
Prediction vs. Control
- Prediction: Given a policy, compute the value function.
- Control: Find the optimal policy.
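The two problems can be contrasted on a tiny example. The sketch below assumes a hypothetical deterministic corridor (positions 0 to 4, reward 1 for reaching position 4) and a discount factor `GAMMA`; it uses iterative policy evaluation for prediction and value iteration for control, which are standard methods but only one possible choice.

```python
N_STATES = 5        # positions 0..4; position 4 is terminal
ACTIONS = (-1, +1)  # move left or right
GAMMA = 0.9         # assumed discount factor


def step(state, action):
    """Deterministic model of the corridor: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def evaluate_policy(policy, sweeps=100):
    """Prediction: compute the value function of a fixed policy."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):   # the terminal state keeps value 0
            s2, r = step(s, policy[s])
            v[s] = r + GAMMA * v[s2]
    return v


def value_iteration(sweeps=100):
    """Control: compute optimal values and a policy that is greedy for them."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            v[s] = max(r + GAMMA * v[s2] for s2, r in (step(s, a) for a in ACTIONS))
    policy = [
        max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * v[step(s, a)[0]])
        for s in range(N_STATES - 1)
    ]
    return v, policy


print(evaluate_policy([-1, -1, -1, -1]))   # prediction for "always left"
print(evaluate_policy([+1, +1, +1, +1]))   # prediction for "always right"
print(value_iteration())                   # control: best values and policy
```

Prediction answers "how good is this policy?", while control answers "which policy is best?"; here the optimal policy found by control is to always move right.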
#MMI706 - Reinforcement Learning at METU